Auditable Versioned Data Storage Outsourcing
Authors: Ertem Esiner, Anwitaman Datta
Abstract
Auditability is crucial for data outsourcing: it facilitates accountability and helps identify data loss or corruption in a timely manner, which in turn reduces the risks from such incidents. In recent years, in step with the growing trend of outsourcing, much progress has been made in designing probabilistic (for efficiency) provable data possession (PDP) schemes. However, even the recent and advanced PDP solutions that deal with dynamic data do so in a limited manner, and only for the latest version of the data. A naive solution that treats different versions in isolation would work, but it leads to tremendous overheads and is undesirable. In this paper, we present algorithms to achieve full persistence (all intermediate configurations are preserved and are modifiable) for an optimized skip list (known as FlexList) so that versioned data can be audited. The proposed scheme provides deduplication at the level of logical, variable-sized blocks, such that only the altered parts of the different versions are kept. The persistent data structure facilitates access (read) of any arbitrary version with the same storage and processing efficiency that state-of-the-art dynamic PDP solutions provide for only the current version, while commit (write) operations incur around 5% additional time. Furthermore, the time overhead for auditing arbitrary versions in addition to the latest version is imperceptible even on a low-end server. Additionally, our approach opens up the possibility of naturally supporting block-level deduplication. While a naive solution to audit versions would copy the whole data and the data structure for each version, our solution uses storage space very close to that of the most efficient delta-based solutions.
Accordingly, we explore how the proposed data structure benefits the system with block-level deduplication in addition to auditability, and how it can be integrated with a state-of-the-art versioning system (Git), in the process improving the storage efficiency of Git and thus helping to scale the size of data that can be stored in Git, without compromising the retrieval efficiency of arbitrary versions.
Preprint by Ertem Esiner and Anwitaman Datta, submitted to Future Generation Computer Systems, August 3, 2015. arXiv:1507.08838v1 [cs.CR], 31 Jul 2015.
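The idea of keeping only the altered blocks across versions can be illustrated with a minimal sketch. This is not the FlexList-based scheme of the paper: it assumes a simple content-addressed block store, and its `audit` method is a plain local hash comparison rather than a PDP challenge-response proof. The class name `VersionedBlockStore` and its methods are hypothetical names chosen for illustration.

```python
import hashlib


class VersionedBlockStore:
    """Toy content-addressed store (illustrative assumption, not FlexList).

    Each version is a list of block digests; a block shared by several
    versions is stored only once.
    """

    def __init__(self):
        self.blocks = {}    # digest -> block bytes, stored once
        self.versions = []  # version number -> list of digests

    def commit(self, blocks):
        """Record a new version; only previously unseen blocks use new space."""
        digests = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.versions.append(digests)
        return len(self.versions) - 1

    def read(self, version):
        """Reassemble the full content of any stored version."""
        return b"".join(self.blocks[d] for d in self.versions[version])

    def audit(self, version, index):
        """Re-hash one block of one version and compare with its digest."""
        digest = self.versions[version][index]
        return hashlib.sha256(self.blocks[digest]).hexdigest() == digest


store = VersionedBlockStore()
v0 = store.commit([b"alpha", b"beta", b"gamma"])
v1 = store.commit([b"alpha", b"BETA!", b"gamma"])  # one variable-sized block changed
```

After the two commits, the store holds four distinct blocks instead of six, and both versions remain readable and checkable. A real auditing scheme would additionally let a client verify blocks without trusting the server, which this sketch does not attempt.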
Related Resources
Multi-versioned Data Storage and Iterative Processing in a Parallel Array Database Engine
Optimal query/update tradeoffs in versioned dictionaries
External-memory dictionaries are a fundamental data structure in file systems and databases. Versioned (or fully-persistent) dictionaries have an associated version tree where queries can be performed at any version, updates can be performed on leaf versions, and any version can be ‘cloned’ by adding a child. Various query/update tradeoffs are known for unversioned dictionaries, many of them wit...
Compressed Differential Erasure Codes for Efficient Archival of Versioned Data
In this paper, we study the problem of storing an archive of versioned data in a reliable and efficient manner in distributed storage systems. We propose a new storage technique called differential erasure coding (DEC) where the differences (deltas) between subsequent versions are stored rather than the whole objects, akin to a typical delta encoding technique. However, unlike delta encoding te...
Nonblocking Distributed Replication of Versioned Files
In this paper, we propose a distributed data storage framework that supports unrestricted offline access. The system does not explicitly distinguish between connected and disconnected states. Its design is based on a lock-free distributed framework that avoids update conflicts through file versioning. We propose an algorithm for replica synchronization. The feasibility of this framework is conf...
Decibel: The Relational Dataset Branching System
As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple c...
Journal title:
- Future Generation Comp. Syst.
Volume 55
Publication date 2016